Abstract

For a non-stationary or non-ergodic marked point process (MPP) on $\mathbb{R}^d$, the
definition of averages becomes ambiguous as the process might have a different
stochastic behavior in different realizations (non-ergodicity) or in different
areas of the observation window (non-stationarity). We investigate different
definitions for the moments, including a new hierarchical definition for
non-ergodic MPPs, and embed them into a family of weighted mean marks. We
point out examples of application in which different weighted mean marks all
have a sensible meaning. Further, asymptotic properties of the corresponding
estimators are investigated as well as optimal weighting procedures.
1. Introduction

Marked point processes (MPPs) provide an adequate framework for modeling
irregularly scattered events in space or time in that they incorporate the joint distribution
of the observed values and the point locations (e.g., [1], [8], [14], [19], [20], [22]). Due to
the variety of possible forms of dependence between marks and locations in an MPP
framework, even the notion of the mean, which is usually considered the
simplest summary statistic, raises tantalizing and challenging questions.

An introductory example for the type of MPP averages being considered within this
paper is the trading process in financial markets. Transactions of assets are typically
characterized by the two quantities price and volume; a benchmark quantity that is
of major interest especially for institutional investors is the so-called volume-weighted
average price (VWAP) (e.g., [3], [15]). The VWAP of $n$ transactions with prices $p_i$ and
traded volumes $v_i$, $i = 1, \ldots, n$, is defined as $p_{\mathrm{vwap}} = \sum_{i=1}^n p_i v_i \big/ \sum_{i=1}^n v_i$.
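
The VWAP formula can be illustrated by a minimal Python sketch. The function name `vwap` and the transaction numbers are hypothetical and serve only as an illustration; they are not taken from any data set.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(p_i * v_i) / sum(v_i)."""
    if len(prices) != len(volumes):
        raise ValueError("prices and volumes must have the same length")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total traded volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


# Hypothetical transactions (illustrative numbers, not market data).
prices = [10.0, 10.5, 9.5]
volumes = [100, 300, 100]

unweighted_mean = sum(prices) / len(prices)  # every transaction counts equally
vwap_value = vwap(prices, volumes)           # large trades count more
```

In this toy example the large trade at the highest price pulls the VWAP above the unweighted mean price, which is the kind of discrepancy between differently weighted means that the paper is concerned with.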
We embed this example in the following general MPP framework: We consider
stationary MPPs on $\mathbb{R}^d$ of the form

$$\Phi = \{(t_i, y_i, z_i) : i \in \mathbb{N}\},$$

where $t_i \in \mathbb{R}^d$ is the point location, $y_i \in \mathbb{R}$ is the first mark and $z_i \in [0, \infty)$ is a
second mark of the $i$th point of $\Phi$. Let $\Phi_g = \{t : (t, y, z) \in \Phi\}$ denote the ground
process of point locations of $\Phi$ and let us denote the marks at a location $t \in \Phi_g$ by
$y(t)$ and $z(t)$. The non-negativity assumption on the $z$-component simplifies technical
assumptions when employing this mark component as weights for averages of the first
mark component $y(t)$ or $f(y(t))$ for some function $f : \mathbb{R} \to \mathbb{R}$. In intuitive notation we
write the corresponding weighted mean as

$$\mu_f^{(1)} = E[z(t) f(y(t)) \mid t \in \Phi_g], \tag{1}$$

where we assume that the $z$-component is normalized such that $E[z(t) \mid t \in \Phi_g] = 1$.
Here, the conditioning on "$t \in \Phi_g$" is understood in the sense of the Palm mark
distribution. Since the weights $z(t)$ are provided by the MPP itself and may depend
on both the marks $y(t)$ and the point locations $t \in \Phi_g$, we refer to $\mu_f^{(1)}$ as the intrinsically
weighted mean mark of $\Phi$. The formal definition of $\mu_f^{(1)}$ and related quantities will be
given at the beginning of Section 2.

When a system of randomly distributed objects is modeled by means of MPPs, there 
can exist different sensible choices of intrinsic weights z(t) leading to different weighted 
mean marks that are relevant for one and the same process, but for different statistical 
questions: 

• Average height of trees: Consider n forests of about equal size, each of which is 
sampled on an area with fixed size and shape. Then the unweighted average of the 
height of all trees provides a measure of the entire timber stand, which is relevant 
for forest inventory applications. This amounts to $z(t) = 1$ in $\mu_f^{(1)}$. Additionally,
the average height of a typical forest (as opposed to a typical tree) might be 
of interest, independently of how dense the trees occur in the different forests. 
Then, a nested definition of mean seems to be adequate where we first average 
within each forest and then between all forests. This is equivalent to using a 
weighted average over all trees with z(t) being proportional to the inverse of the 
number of trees in the forest that location t belongs to. 

• Density of insects on plants, cf. fJjj: Consider n plants and a population of insects 
distributed over the plants. Let $k_i$, $i = 1, \ldots, n$, be the number of insects on the
$i$th plant. In this set-up there are different well-established definitions of density
referring to different ecological effects. The ordinary density of insects, also
called resource-weighted density, is $(k_1 + \ldots + k_n)/n$ and quantifies the average
availability of resources. In contrast, the organism-weighted density is the density
that an average insect experiences. Each individual on plant $i$ experiences a
density of $k_i$ insects per plant, i.e., the organism-weighted density is
$(k_1^2 + \ldots + k_n^2)/(k_1 + \ldots + k_n)$. In MPP notation, each insect is represented by a point, marked
by the total number of insects on the plant on which the insect is located. Then
the organism-weighted density corresponds to the ordinary mean mark ($z(t) = 1$),
whereas the resource-weighted density is the average of all plant-wise averages of
the marks, i.e., $z(t) = (n k_i)^{-1}$ if $t$ belongs to plant $i$.
• Sampling of continuous-space processes: Measurements of continuous-space or
continuous-time processes usually aim at estimating or predicting the underlying 
process and the mean of interest is therefore the spatial or temporal mean 
over the whole domain of the process. Since measurement locations are not 
necessarily independent of the underlying process, knowledge of the pattern 
of point locations might already provide information about the values of the 
process. Such a situation is commonly referred to as biased or preferential 
sampling and different weighting approaches exist to correct for this form of 
biases (e.g., [H]). Although most statistical methods only use stationarity, 
ergodicity is often implicitly assumed. In case of non-ergodicity, which means that 
different realizations can have a different stochastic behavior, we are faced with 
an additional dimension of biasedness: Within each ergodic subclass, the pattern 
of point locations can be independent of the underlying process, while there 
might be a strong dependence between the pattern of measurement locations and 
the process itself if multiple realizations are considered. For a simple example, 
consider a Gaussian random field with a random mean m combined with a Poisson 
point process of measurement locations whose intensity of points is a function 
of m. 
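
The two densities of the insect example above can be reproduced with a short Python sketch. The insect counts and all function names are hypothetical, and the sketch assumes that every plant carries at least one insect, so that each plant is represented by at least one point.

```python
def resource_weighted_density(counts):
    """(k_1 + ... + k_n) / n: average number of insects per plant."""
    return sum(counts) / len(counts)


def organism_weighted_density(counts):
    """(k_1^2 + ... + k_n^2) / (k_1 + ... + k_n): density felt by an average insect."""
    return sum(k * k for k in counts) / sum(counts)


def insect_marks(counts):
    """One (plant index, mark) pair per insect; the mark is the count on its plant."""
    return [(i, k) for i, k in enumerate(counts) for _ in range(k)]


def weighted_mean_mark(points, weight):
    """Sum of weight * mark over all points, divided by the sum of the weights."""
    total_weight = sum(weight(i) for i, _ in points)
    return sum(weight(i) * mark for i, mark in points) / total_weight


counts = [1, 2, 5]  # hypothetical insect counts on n = 3 plants (all positive)
n = len(counts)
points = insect_marks(counts)

organism = weighted_mean_mark(points, lambda i: 1.0)                    # z(t) = 1
resource = weighted_mean_mark(points, lambda i: 1.0 / (n * counts[i]))  # z(t) = (n k_i)^{-1}
```

Both weighted mean marks are computed from the same marked points; only the intrinsic weight changes, and the results agree with the direct formulas for the organism-weighted and the resource-weighted density.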



While ergodicity of MPPs is necessary for a straightforward interpretation of the 
mark distribution as the distribution of a typical point and, at least implicitly, is 
required by many applications for consistent estimation, in this paper, we investigate 
the behavior of moment-based summary statistics in case of non-ergodic MPPs and 
intend to point out problems of ambiguity in this context. When the different forests 
and plants in the above examples are perceived as a set of MPP realizations and exhibit 
different ecological characteristics, non-ergodicity has to be included. Examples for 
non-ergodic MPPs that evolve in time can easily be found in the financial world: For 
subsequent days of asset trading, the process of executed transactions can be considered 
as different realizations of a possibly non-ergodic MPP. To treat non-ergodic MPPs 
adequately, we propose intrinsically weighted mean marks as a special case of (1) in
which the weights are constant within each ergodicity class but allow for compensating
for differences between the different ergodicity classes. A direct application of the
theory developed within this paper is [17], in which interaction effects within
high-frequency financial data are investigated via MPP methods.

The remainder of this article is organized as follows: In Section 2 we recall and
generalize moment-based characteristics for MPPs which also form the central tool for
the analysis of interactions in MPPs. We study their behavior and interpretation
for non-ergodic processes and, following the idea of the above examples, propose
alternative definitions of moment-based summary statistics in Section 3. Different
estimators for the above characteristics and their asymptotic properties are discussed
in Section 4; the paper closes with a comparison of the point process set-up with
estimation of continuous-space processes, which typically occur within geostatistical
applications. The appendix reviews basic results from ergodic theory and contains
some of the proofs of Section 4.
2. MPP moment-measures and measurement of interaction effects

Throughout the paper $\Phi = \{(t_i, y_i, z_i) : i \in \mathbb{N}\}$ is a stationary and simple marked
point process on $\mathbb{R}^d$ with marks $(y(t_i), z(t_i)) = (y_i, z_i) \in \mathbb{R} \times [0, \infty)$, and
$\Phi_g = \{t : (t, y, z) \in \Phi\}$ is its ground process of point locations. In particular, the point
configuration $\Phi_g$ is locally finite. For the general theory of point processes, the reader
is referred to [BJ [7J [M], for example. Let us remark that the following definitions of 
MPP statistics can directly be generalized to MPPs on Polish spaces whose marks are 
also in a Polish space. 

One of the most basic mark summary statistics is the weighted mean mark $\mu_f^{(1)}$,
which we introduced in (1) as a conditional mean, conditional on the event $\{t \in \Phi_g\}$.
Since for fixed $t \in \mathbb{R}^d$ this is a zero-probability event, the classical formal definition is

$$\mu_f^{(1)} = \frac{E \sum_{(t,y,z) \in \Phi} z f(y) \mathbf{1}_B(t)}{E \sum_{(t,y,z) \in \Phi} z \mathbf{1}_B(t)} \tag{2}$$

for any Borel set $B \subset \mathbb{R}^d$ with $|B| > 0$. Here we implicitly exclude the degenerate case
$z(t) = 0$. Due to the stationarity of $\Phi$, this definition does not depend on the choice
of $B$.

Proposition 2.1. Both definitions of $\mu_f^{(1)}$, (1) and (2), coincide.

Proof. The assertion follows from standard arguments of MPP theory [Jj chap. 13]. 

The most relevant example of $f$ in practical application is $f(y) = y^n$ for $n =
1, 2, \ldots$ Then, if $z(t) = 1$ for $t \in \Phi_g$, $\mu_f^{(1)}$ simply represents the $n$-th moment of the
(Palm) mark distribution. Note that in case the MPP represents measurements of an
underlying continuous process, the mean mark can substantially differ from the mean 
of the underlying process due to stochastic dependence between the sampling locations 
and the process itself. 

While the above statistic reflects (average) properties of single points, second-order
characteristics (in intuitive notation $E[f(y(t_1), y(t_2)) \mid t_1, t_2 \in \Phi_g, t_1 \neq t_2]$)
provide a framework to investigate dependency structures within MPPs. We use the
superscripts $(1)$ and $(2)$ to indicate whether first- or second-order measures are meant.

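Definition 2.1. For any non-negative function $f$ on $\mathbb{R} \times \mathbb{R}$, we define a $\sigma$-finite measure
on $\mathbb{R}^d \times \mathbb{R}^d$ by

$$\alpha_f^{(2)}(C) = E \sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi} z_1 f(y_1, y_2) \mathbf{1}_C((t_1, t_2)), \quad C \in \mathcal{B}(\mathbb{R}^d \times \mathbb{R}^d), \tag{3}$$

which we call weighted second moment measure. Here, $\sum^{\neq}$ indicates that the sum runs
over all pairs of points with $(t_1, y_1) \neq (t_2, y_2)$.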

With the notation

$$C(B, I) = \begin{cases} \{(t_1, t_2) : t_1 \in B,\ t_2 \in t_1 + I\}, & d = 1, \\ \{(t_1, t_2) : t_1 \in B,\ t_2 \in t_1 + \{x \in \mathbb{R}^d : \|x\| \in I\}\}, & d > 1, \end{cases}$$

$$C(t, I) = C([0, t], I),$$

$$C(I) = C([0, 1], I),$$

for $B \in \mathcal{B}(\mathbb{R}^d)$, $t \in \mathbb{R}^d$, $I \in \mathcal{B}(\mathbb{R})$,

$$\alpha_f^{(2)}(C(I)), \quad I \in \mathcal{B}(\mathbb{R}), \tag{4}$$

defines a $\sigma$-finite measure on $\mathbb{R}$. Well-known examples of second-order mark
characteristics for stationary and isotropic MPPs are Cressie's mark variogram and covariance
function [5], Stoyan's $k_{mm}$-function [23], and Isham's mark correlation function [12],
which can all be expressed in terms of (3) or (4) with a constant $z$-component. [21]
provides a unifying notation for the above characteristics and further introduces new
functions, $E$ and $V$, where $E(r)$ and $V(r)$ represent the mean and variance of a
mark, respectively, given that there exists a further point at distance $r > 0$. For
the one-dimensional case, e.g., for temporal processes, [16] extend those characteristics
to the non-isotropic set-up, where a negative value of $r$ means that the point that is
conditioned on is in the past. The above second-order characteristics only involve the
three functions $f(y_1, y_2) = y_1 y_2$, $f(y_1, y_2) = y_1$ and $f(y_1, y_2) = y_1^2$.

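Definition 2.2. (cf. …) For a general non-negative function $f$ on $\mathbb{R} \times \mathbb{R}$, we define

$$\mu_f^{(2)}(I) = \frac{\alpha_f^{(2)}(C(I))}{\alpha_1^{(2)}(C(I))} \tag{5}$$

if $\alpha_1^{(2)}(C(I)) > 0$. Here, $\alpha_1^{(2)}$ is short notation for $\alpha_f^{(2)}$ with $f \equiv 1$. We call $\mu_f^{(2)}$ the
(weighted) second-order mean mark.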

In the following, we always assume that $I$ is chosen such that $\alpha_1^{(2)}(C(I)) > 0$. Note
that the distinction between $d = 1$ and $d > 1$ in the definition of the set $C(B, I)$
allows to capture a possibly anisotropic behavior of $\mu_f^{(2)}$ in the one-dimensional case.
In particular,

$$\alpha_f^{(2)}(C(I)) = \begin{cases} E \sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi,\ t_1 \in [0,1]} z_1 f(y_1, y_2) \mathbf{1}_{t_2 - t_1 \in I}, & d = 1, \\ E \sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi,\ t_1 \in [0,1]} z_1 f(y_1, y_2) \mathbf{1}_{\|t_2 - t_1\| \in I}, & d > 1. \end{cases}$$



For higher dimensions, it is also possible to assign different directions of isotropy, but
the technical burden increases considerably as $\mu_f^{(2)}$ will not be a function of a scalar
argument anymore. For further notational convenience, we assume that the derivative
of $\alpha_f^{(2)}$ w.r.t. the Lebesgue measure exists, which is then referred to as product density
and denoted by $\rho_f^{(2)}$.

Due to the stationarity of $\Phi$, we have $\rho_f^{(2)}(t_1, t_2) = \rho_f^{(2)}(0, t_2 - t_1)$ for almost all
$(t_1, t_2) \in \mathbb{R}^{2d}$ and hence $\alpha_f^{(2)}(C) = \int_C \rho_f^{(2)}(0, h_2 - h_1)\, d(h_1 \times h_2)$, $C \in \mathcal{B}(\mathbb{R}^d \times \mathbb{R}^d)$.
Let $\rho_f^{C,(2)}(r)$, $r \in \mathbb{R}$, denote the derivative of $\alpha_f^{(2)}(C(\cdot))$ w.r.t. the one-dimensional
Lebesgue measure. Obviously, $\alpha_f^{(2)}(C(\cdot))$ is dominated by $\alpha_1^{(2)}(C(\cdot))$, which ensures
that the limit of $\mu_f^{(2)}(I)$ for $|I| \to 0$ exists and can be expressed in terms of
Radon-Nikodym derivatives. For $r \neq 0$ we define

$$\mu_f^{(2)}(r) = \frac{\partial \alpha_f^{(2)}(C(\cdot))}{\partial \alpha_1^{(2)}(C(\cdot))}(r) = \frac{\rho_f^{C,(2)}(r)}{\rho_1^{C,(2)}(r)}. \tag{6}$$
Note that for $d = 1$, we have $\rho_f^{C,(2)}(r) = \rho_f^{(2)}(0, r)$. With a slight abuse of notation,
we refer to both definitions (5) and (6) as $\mu_f^{(2)}$. For $r \neq 0$ and $f$ only depending
on its first argument, $\mu_f^{(2)}(r)$ can be interpreted as the (weighted) expectation of a
mark at location $t$ subject to the conditioning that $\Phi$ has a point at location $t$ and
at location $t + r e_1$, i.e., $\mu_f^{(2)}(r) = E[z(t) f(y(t)) \mid t, t + r e_1 \in \Phi_g]$, where $e_1$ denotes the
vector $(1, 0, \ldots, 0)^T \in \mathbb{R}^d$. For $\mu_f^{(2)}(I)$, this interpretation becomes slightly ambiguous:
Considering an event at time $t$, there may be multiple other points located within the
set $t + I$ and in case that interactions of higher order are present, these will be reflected
by the second-order statistic $\mu_f^{(2)}(I)$ as well. More precisely, by the definitions in (5)
and (6),

$$\mu_f^{(2)}(I) = \alpha_1^{(2)}(C(I))^{-1} \int_I \mu_f^{(2)}(r)\, d\alpha_1^{(2)}(C(r)), \tag{7}$$

i.e., $\mu_f^{(2)}(I)$ is a weighted average of conditional expectations $\mu_f^{(2)}(r)$ with weights being
proportional to the expected number of pairs of points with distance $dr$.
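
A brief sketch of the step behind (7), written out under the assumption already made above that $\alpha_f^{(2)}(C(\cdot))$ is dominated by $\alpha_1^{(2)}(C(\cdot))$, so that the Radon-Nikodym derivative in (6) exists:

```latex
\alpha_f^{(2)}(C(I))
  = \int_I \frac{\partial \alpha_f^{(2)}(C(\cdot))}{\partial \alpha_1^{(2)}(C(\cdot))}(r)
      \, d\alpha_1^{(2)}(C(r))
  = \int_I \mu_f^{(2)}(r) \, d\alpha_1^{(2)}(C(r)) .
```

Dividing both sides by $\alpha_1^{(2)}(C(I)) > 0$ and inserting definition (5) on the left-hand side gives the representation (7).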

Remark 2.1. (a) The extension to moment measures of higher order is
straightforward and allows to condition on arbitrary point constellations. In practice,
however, mostly first- and second-order statistics are considered.

(b) The non-negativity condition on $f$ can be weakened by considering the restriction of
$\mu_f^{(2)}(\cdot)$ to some bounded set $J \in \mathcal{B}(\mathbb{R})$. Then it is sufficient for $f$ that $\alpha_h^{(2)}(C(J)) < \infty$
is satisfied for $h = f_+ = \max\{f, 0\}$ or for $h = f_- = -\min\{f, 0\}$.

(c) Another generalization allows to include further conditioning on the marks. For
$f_{\mathrm{cond}}$ a non-negative function on $\mathbb{R} \times \mathbb{R}$ we consider

$$\mu_{f, f_{\mathrm{cond}}}^{(2)}(\cdot) = \frac{\alpha_{f \cdot f_{\mathrm{cond}}}^{(2)}(C(\cdot))}{\alpha_{f_{\mathrm{cond}}}^{(2)}(C(\cdot))}.$$

Choosing $f_{\mathrm{cond}}$ to be an indicator function $f_{\mathrm{cond}}(y_1, y_2) = \mathbf{1}_A(y_1) \mathbf{1}_B(y_2)$ conditions
the marks on the events $A$ and $B$, respectively.

Remark 2.2. For $d > 1$, $\mu_f^{(2)}$ is a function of the Euclidean distance between two
points, whereas for $d = 1$, $\mu_f^{(2)}$ is a function of the signed distance. In the latter case,
$\mu_f^{(2)}(\cdot)$ is in general not symmetric: Consider a temporal process consisting of pairs of
points $(t_1, t_2)$ with $t_1 < t_2$ and with small intra- but large inter-pair distances. Assume
that the marks of different pairs are stochastically independent and that for each pair
of points, $f(y_1, y_2) > f(y_2, y_1)$ holds. Then $\mu_f^{(2)}(r) > \mu_f^{(2)}(-r)$ holds for all $r > 0$ that
are small enough and that can occur as intra-pair distances.

For notational convenience, we will write $\mu_f^{(i)}$ to indicate that a statement is valid
for $\mu_f^{(1)}$ and $\mu_f^{(2)}$.
3. New moment measures for non-ergodic MPPs

Ergodicity makes spatial averages over suitably increasing observation windows of
a single realization converge to the corresponding expectation over the state space:

$$\frac{1}{|W|} \int_W X(T_x \Phi)\, dx \to E(X(\Phi)), \quad \text{for } |W| \to \infty \text{ suitably},$$

for any integrable function $X$ on the space of all locally finite counting measures.
Here, $T_x$ denotes the shift of the whole random point pattern $\Phi$ by $x \in \mathbb{R}^d$. In essence,
ergodicity enables consistent estimation of MPP moment measures by observing a
single realization on a suitably increasing domain. In this section, though, we consider
the opposite situation, namely where $\Phi$ is a non-ergodic process.

The following proposition directly relates to the fact that a non-ergodic MPP can be
seen as a hierarchical model, which, in a first step, draws an ergodic source of randomness
out of which the final realization is drawn in a second step.

Proposition 3.1. Let $\Phi$ be a non-ergodic MPP with probability law $P$. By $M_0$ and
$\mathcal{M}_0$ we denote the space of all locally finite counting measures on $\mathbb{R}^d \times \mathbb{R} \times [0, \infty)$ and
the usual $\sigma$-algebra, respectively. (See Appendix A for more details.) Then

$$\mu_f^{(2)}(\cdot) = \frac{E_Q\left[\alpha_{1,\Phi|Q}^{(2)}(C(\cdot))\, \mu_{f,\Phi|Q}^{(2)}(\cdot)\right]}{\alpha_1^{(2)}(C(\cdot))}, \tag{9}$$

where $Q \sim \lambda$ is a random variable with values in the space $\mathcal{P}_{\mathrm{erg}}$ of all ergodic MPP
probability laws, distributed according to some probability measure $\lambda$, such that
$P(M) = \int_{\mathcal{P}_{\mathrm{erg}}} Q^*(M)\, \lambda(dQ^*)$, $M \in \mathcal{M}_0$. If $\mu_f^{(2)}$ is evaluated for a fixed distance $r \in \mathbb{R}$,
$\alpha_1^{(2)}(C(r))$ has to be replaced by $\rho_1^{C,(2)}(r)$ in (9).



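Proof. The ergodic decomposition theorem (cf. Theorem A.2) guarantees the
existence and uniqueness of a decomposition $P(\cdot) = \int_{\mathcal{P}_{\mathrm{erg}}} Q^*(\cdot)\, \lambda(dQ^*)$ and a corresponding
mixing random variable $Q \sim \lambda$. Conditioning $\Phi$ on $Q$, we can decompose the moment
measures $\alpha_f^{(2)}$ and obtain

$$\mu_f^{(2)}(r) = \frac{\partial E_Q \alpha_{f,\Phi|Q}^{(2)}(C(\cdot))}{\partial \alpha_1^{(2)}(C(\cdot))}(r) = \frac{\partial E_Q \alpha_{f,\Phi|Q}^{(2)}(C(\cdot)) / \partial\nu(\cdot)}{\partial \alpha_1^{(2)}(C(\cdot)) / \partial\nu(\cdot)}(r) = \frac{E_Q\left[\rho_{1,\Phi|Q}^{C,(2)}(r)\, \mu_{f,\Phi|Q}^{(2)}(r)\right]}{\rho_1^{C,(2)}(r)},$$

where $\nu$ denotes the Lebesgue measure. For $\mu_f^{(1)}$ and $\mu_f^{(2)}(I)$, the decomposition is
analogous.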

Example 3.1. The so-called log-Gaussian Cox process [18] is ergodic if and only if the 
underlying stationary Gaussian random field Z is ergodic. A sufficient condition for Z 
being ergodic is that the covariance function decays to zero. Amongst others, [S] and 
[20] use log-Gaussian Cox processes, combined with an intensity-dependent marking, 
as parametric models for preferential sampling applications. 
Proposition 3.1 shows that in case of non-ergodicity, $\mu_f^{(i)}$ is an average of its ergodic
subclasses counterparts, in which each class $Q^*$ is implicitly weighted by the respective
intensity $\alpha_{1,\Phi|Q=Q^*}^{(i)}$. If all ergodic subprocesses $[\Phi \mid Q = Q^*]$ have the same intensity
measure, the weights cancel out and we have $\mu_f^{(i)} = E_Q \mu_{f,\Phi|Q}^{(i)}$. Since in the general
case, a single ergodicity class with low probability may exhibit a large value of $\alpha_{1,\Phi|Q=Q^*}^{(i)}$
and thus drive the value of $\mu_f^{(i)}$, the demand for a new characteristic $\tilde\mu_f^{(i)}$ arises
naturally, that summarizes the properties of all ergodicity classes irrespectively of how
the processes of point locations differ between the different ergodicity classes. We meet
these requirements by a definition that excludes the implicit weighting proportional to
the $i$th order intensities:

Definition 3.1. Let $\lambda$ and $Q$ be the ergodic decomposition mixture measure and
mixture variable, respectively, of $\Phi$, and let $E_Q |\mu_{f,\Phi|Q}^{(i)}| < \infty$. Then we call

$$\tilde\mu_f^{(i)} = E_Q\, \mu_{f,\Phi|Q}^{(i)} = \int_{\mathcal{P}_{\mathrm{erg}}} \mu_{f,\Phi|Q=Q^*}^{(i)}\, \lambda(dQ^*) \tag{10}$$

the (equally-weighted) average $i$th-order mean mark of $\Phi$.

Relating to the introductory forest example, the classical definition of the mean mark
in (2) corresponds to the average height of all trees, irrespectively of differences w.r.t.
the tree densities between the different forests, while the new definition in (10) refers
to the average height of a typical forest.

Remark 3.1. Comparing the new definition with (9) yields that $\tilde\mu_f^{(i)}$ coincides with
$\mu_f^{(i)}$ if $\alpha_{1,\Phi|Q}^{(i)}$ is $\lambda$-a.s. constant. This is particularly the case if $\Phi$ is ergodic.
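
The difference between the classical and the equally-weighted mean mark can be made concrete with a small Python sketch for the first-order case with $z(t) = 1$. It assumes, purely for illustration, a finite number of ergodicity classes with known probabilities, point intensities and class-wise mean marks; the function names and all numbers are hypothetical.

```python
def classical_mean_mark(probs, intensities, class_means):
    """Intensity-weighted mixture of class-wise means:
    sum_k p_k * lam_k * m_k / sum_k p_k * lam_k."""
    numerator = sum(p * lam * m for p, lam, m in zip(probs, intensities, class_means))
    denominator = sum(p * lam for p, lam in zip(probs, intensities))
    return numerator / denominator


def two_stage_mean_mark(probs, class_means):
    """Equally-weighted average over classes: sum_k p_k * m_k."""
    return sum(p * m for p, m in zip(probs, class_means))


# Two hypothetical forest types: sparse with tall trees, dense with short trees.
probs = [0.5, 0.5]          # probability of each ergodicity class
intensities = [1.0, 9.0]    # trees per unit area within each class
class_means = [20.0, 10.0]  # mean tree height within each class

typical_tree = classical_mean_mark(probs, intensities, class_means)
typical_forest = two_stage_mean_mark(probs, class_means)

# With identical intensities the implicit weights cancel and both notions agree.
same_intensity = classical_mean_mark(probs, [3.0, 3.0], class_means)
```

The dense class dominates the classical mean mark, whereas the two-stage mean gives both classes the weight prescribed by the mixture measure alone.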

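Lemma 3.1. For any $I \in \mathcal{B}(\mathbb{R})$ we have

$$\tilde\mu_f^{(2)}(I) = E_Q\left[\alpha_{1,\Phi|Q}^{(2)}(C(I))^{-1} \int_I \mu_{f,\Phi|Q}^{(2)}(r)\, d\alpha_{1,\Phi|Q}^{(2)}(C(r))\right].$$

If, for $\lambda$-almost all measures $Q^*$, $\mu_{f,\Phi|Q=Q^*}^{(2)}(r)$ is uniformly bounded by some positive
constant $c(Q^*)$ and $E_Q c(Q) < \infty$, for $I \in \mathcal{B}(\mathbb{R})$ and $r \in I$, we have

$$\lim_{I \to \{r\}} \tilde\mu_f^{(2)}(I) = \tilde\mu_f^{(2)}(r).$$

Proof. The first assertion follows directly from applying the representation (7) to
the ergodic subprocesses $[\Phi \mid Q = Q^*]$. Since $\lim_{I \to \{r\}} \mu_{f,\Phi|Q}^{(2)}(I) = \mu_{f,\Phi|Q}^{(2)}(r)$ by construction,
the second assertion is merely an application of Lebesgue's dominated convergence
theorem.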

From Lemma 3.1 we see that the nested conditional mean $\tilde\mu_f^{(2)}(r)$ is a
Radon-Nikodym derivative of $\alpha_f^{(2)}(C(\cdot))$ w.r.t. $\alpha_1^{(2)}(C(\cdot))$ if and only if the expectation of
$\alpha_{1,\Phi|Q}^{(2)}(C(\cdot))\, \mu_{f,\Phi|Q}^{(2)}(\cdot)$ factorizes. This contrasts with the ordinary conditional mean $\mu_f^{(2)}(r)$,
which is already defined as a Radon-Nikodym derivative of $\alpha_f^{(2)}(C(\cdot))$ w.r.t. $\alpha_1^{(2)}(C(\cdot))$.



Intrinsically Weighted Means of Marked Point Processes 






The ergodic decomposition and an analog to Definition 3.1 can be applied to any
expectation-based functional of an MPP including the Palm mark distribution itself.
While the classical definition of the mean mark represents a typical point, irrespectively
of the different ergodicity classes, the two-stage expectation $\tilde\mu_f^{(i)}$ refers to the mean of a
typical realization. We provide more details on the meaning of the differences between
$\mu_f^{(i)}$ and $\tilde\mu_f^{(i)}$ and between different estimators in the next section.

4. Estimation principles for the new MPP moment-measures 

4.1. The ergodic case 

For ergodic processes the pointwise ergodic theorem for MPPs (Proposition A.1
in the Appendix) yields that

$$E \sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi} z_1 f(y_1, y_2) \mathbf{1}_{(t_1,t_2) \in C(I)} = \lim_{n \to \infty} n^{-d} \sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \varphi} z_1 f(y_1, y_2) \mathbf{1}_{(t_1,t_2) \in C(n\mathbf{1}, I)}$$

for almost all realizations $\varphi$ of $\Phi$, which builds the basis for the estimators being
discussed in this section. For readability reasons, and since we will be only dealing with
second-order statistics from now on, we drop the superscript $(2)$ in all the estimators.

Applying the standard estimator for MPP moment measures to a realization of $\Phi$
observed on the set $[0, T]$, $T \in (0, \infty)^d$, we obtain

$$\hat\mu_f(I, \Phi, T) = \frac{\hat\alpha_f(I, \Phi, T)}{\hat\alpha_1(I, \Phi, T)}, \tag{11}$$

where $\hat\alpha_f(I, \Phi, T) = \sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi} z_1 f(y_1, y_2) \mathbf{1}_{(t_1,t_2) \in C(T, I)}$.
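
For the one-dimensional case, the estimator (11) can be sketched in a few lines of Python. This is an illustrative sketch only: the names `alpha_hat` and `mu_hat` are hypothetical, the set $I$ is assumed to be a closed interval $[a, b]$, and edge effects (partners $t_2$ outside the observation window) are ignored.

```python
def alpha_hat(points, f, interval, T):
    """Sum of z1 * f(y1, y2) over ordered pairs of distinct points with
    t1 in [0, T] and signed distance t2 - t1 in the interval [a, b] (d = 1)."""
    a, b = interval
    total = 0.0
    for i, (t1, y1, z1) in enumerate(points):
        if not 0.0 <= t1 <= T:
            continue
        for j, (t2, y2, _z2) in enumerate(points):
            if i != j and a <= t2 - t1 <= b:
                total += z1 * f(y1, y2)
    return total


def mu_hat(points, f, interval, T):
    """Ratio estimator (11): alpha_hat for f divided by alpha_hat for f = 1."""
    denominator = alpha_hat(points, lambda y1, y2: 1.0, interval, T)
    if denominator == 0.0:
        raise ValueError("no pair of points with distance in the interval")
    return alpha_hat(points, f, interval, T) / denominator


# Hypothetical temporal pattern of (t, y, z) triples: two tight pairs, z = 1.
points = [(0.0, 1.0, 1.0), (0.4, 3.0, 1.0), (2.0, 5.0, 1.0), (2.3, 7.0, 1.0)]

def first_mark(y1, y2):
    return y1

forward = mu_hat(points, first_mark, (0.0, 0.5), T=3.0)    # partner in the future
backward = mu_hat(points, first_mark, (-0.5, 0.0), T=3.0)  # partner in the past
```

In this toy pattern the later point of each pair carries the larger mark, so the estimate for negative signed distances exceeds the one for positive distances, in line with the asymmetry discussed in Remark 2.2.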

Lemma 4.1. If $\Phi$ is ergodic, $\hat\mu_f(I, \Phi, T)$ is consistent for $\mu_f^{(2)}(I)$ as $T \to \infty$. Here, "$T \to \infty$" is
understood componentwise. If $\Phi$ is non-ergodic, $\hat\mu_f(I, \Phi, T)$ is consistent if and only if
$\mu_{f,\Phi|Q=Q^*}^{(2)}(I)$ is constant w.r.t. $Q^*$.

Proof. By Proposition A.1, the tuple consisting of the numerator and the
denominator of (11), each normalized by the volume of $[0, T]$, converges a.s. to the vector
$(\alpha_f^{(2)}(C(I)), \alpha_1^{(2)}(C(I)))$ if $\Phi$ is ergodic. The first assertion thus follows from the
continuous mapping theorem. In the non-ergodic case, clearly only $\mu_{f,\Phi|Q=Q^*}^{(2)}(I)$ can
be estimated consistently for $Q^*$ being the respective ergodicity class. Though, if
$\mu_{f,\Phi|Q=Q^*}^{(2)}(I)$ is constant w.r.t. $Q^*$ we have $\mu_f^{(2)}(I) = \mu_{f,\Phi|Q=Q^*}^{(2)}(I)$ for any $Q^* \in \mathcal{P}_{\mathrm{erg}}$.



To establish asymptotic normality of $\hat\mu_f(I, \Phi, T)$, we introduce some idealized
assumptions. In particular, we assume stochastic independence between the point
locations and the marks of the MPP. For simplicity, we restrict to the case where $f$ only
depends on its first argument and the MPP is a process on $\mathbb{R}$.
Condition 4.1. (m-dependent Random Field Model.) Let $\Phi_g$ be a stationary unmarked
point process on $\mathbb{R}$, for which neighboring points have some minimum distance $d_0 > 0$.
Let $\{Y(t) : t \in \mathbb{R}\}$ be a stationary process, independent of $\Phi_g$, with finite second moments
and a covariance function $C$ that has finite range, i.e., $C(h) = 0$ for all $|h| > h_0$ for
some $h_0 > 0$. Then, with $m = [d_0/h_0]$, we say that an MPP $\Phi$ is an m-dependent
Random Field Model, if $\Phi = \{(t_i, Y(t_i), 1) \mid t_i \in \Phi_g\}$.

The following theorem transfers a central limit theorem (CLT) for arrays of m-dependent
random variables to the MPP context. It also covers a thinning of the MPP in which
the threshold increases with the observation window. The result allows to derive
asymptotically exact confidence intervals for the estimator of $\mu_f^{(2)}(I)$ and is applied
in [17] in the context of extreme value analysis for MPPs.

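Theorem 4.1. (CLT for m-dependent Random Field Models.) Let $\Phi$ be an ergodic
MPP that satisfies Condition 4.1. For $f : \mathbb{R} \to [0, \infty)$ and $u > 0$, let $f_u, f_{\mathrm{cond},u} : \mathbb{R} \to
[0, \infty)$ be given by $f_u(y) = (f(y) - u)_+ = (f(y) - u)\mathbf{1}_{f(y) > u}$ and $f_{\mathrm{cond},u}(y) = \mathbf{1}_{f(y) > u}$.
Let

$$\hat\alpha^0_{f_u}(I, \Phi, T) = \sum^{\neq}_{(t_1,y_1),(t_2,y_2) \in \Phi} \left(f_u(y_1) - \mu_{f_u, f_{\mathrm{cond},u}}^{(2)}(I)\right) \cdot f_{\mathrm{cond},u}(y_1) \cdot \mathbf{1}_{(t_1,t_2) \in C(T, I)}$$

be a centered version of $\hat\alpha_{f_u}(I, \Phi, T)$, where $\mu_{f_u, f_{\mathrm{cond},u}}^{(2)}(I)$ is defined as in (5). Let
$(u_T)_{T > 0}$ be a family of non-negative, non-decreasing numbers such that the following
conditions are satisfied:

$$u_\infty = \lim_{T \to \infty} u_T \in [0, \infty] \text{ exists},$$

$$\lim_{T \to \infty} E\left[f_{u_T}(Y(0))^i \mid f(Y(0)) > u_T\right] < \infty \quad (i = 1, \ldots, 4).$$

Then, for $I \in \mathcal{B}(\mathbb{R})$ and $T \to \infty$, we have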



A Af(0, .■ 



where

$$s_{u_\infty} = \lim_{T \to \infty} \left\{(K_T T)^{-1} \operatorname{Var}\left[\hat\alpha^0_{f_{u_T}}(I, \Phi, T)\right]\right\},$$

$$\lambda_u = E_\Phi\left[\hat\alpha_{f_{\mathrm{cond},u}}(I, \Phi, 1)\right], \quad u > 0.$$

The proof is given in Appendix B. Note that the asymptotic variance $s_{u_\infty}$ can be given
in a more explicit form for suitable choices of $f$ and suitable distributional assumptions
on the underlying random field $Y$. A related CLT result was provided by [10] for
random measures associated to germ-grain models.
4.2. The non-ergodic case

If $\Phi$ is non-ergodic, consistent estimation of summary statistics generally requires
multiple realizations of the process. Let $P$ and $\lambda$ denote the probability law and
the ergodic mixture measure of $\Phi$, respectively. Then, drawing iid realizations of $\Phi$
corresponds to drawing ergodicity classes according to the mixture measure $\lambda$. Though,
a finite collection of realizations merely approximates the mixing measure $\lambda$ and we
can only expect consistency if both $n$ and $T$ tend to infinity simultaneously. To see
why $n \to \infty$ is not sufficient, consider an MPP with infinitely many ergodicity classes
$Q_1, Q_2, \ldots$ and with $E_{\Phi|Q=Q_i} \Phi([0, 1]) = 2^{-i}$. Then, for fixed $T$, the probability of
observing at least one point in a realization that belongs to class $i$ tends to zero as
$i \to \infty$. Hence, the classes $Q_i$, for $i$ large, are only captured by the estimator if $T$ also
tends to infinity.
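
A quick numerical illustration of this argument, under the additional assumption (not made in the text, which only fixes the expected number of points per unit interval) that the points of class $i$ form a homogeneous Poisson process on $\mathbb{R}$ with intensity $2^{-i}$. The function name is hypothetical.

```python
import math


def prob_at_least_one_point(i, T):
    """P(N([0, T]) >= 1) for a Poisson count with mean 2**(-i) * T.
    The Poisson assumption is made for illustration only."""
    mean_count = (2.0 ** (-i)) * T
    return -math.expm1(-mean_count)  # equals 1 - exp(-mean_count)


# Fixed window length T = 1: detection probabilities decay with the class index.
probs_fixed_T = [prob_at_least_one_point(i, 1.0) for i in (1, 5, 10, 20)]

# Letting T grow with the class index keeps the class observable.
prob_growing_T = prob_at_least_one_point(20, 2.0 ** 20)
```

For fixed $T$ the probabilities shrink roughly like $2^{-i} T$, whereas a window of length $2^{i}$ keeps the probability of seeing at least one point of class $i$ at $1 - e^{-1}$.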

Considering iid realizations $\Phi_1, \ldots, \Phi_n$ of $\Phi$, different possibilities arise of how to
put together the respective estimators. Let $w = (w_1, \ldots, w_n)$ denote a vector of weight
functions $w_i : M_0 \times [0, \infty)^d \to [0, \infty)$. We assume that for $\lambda$-almost all ergodic MPP
laws $Q^*$ there exist constants $w_i^*(Q^*) \geq 0$ with $w^*(Q^*) = \sum_{i=1}^n w_i^*(Q^*) > 0$ to which
the weights converge stochastically within the respective ergodicity class, i.e.,

$$P_{\Phi|Q=Q^*}\left(|w_i(\Phi, T) - w_i^*(Q^*)| > \varepsilon\right) \to 0 \quad (T \to \infty) \tag{12}$$

for all $\varepsilon > 0$. Then we consider estimators of the form

$$\hat\mu_f^{n,\mathrm{wght}}(I, w) = \hat\mu_f^{n,\mathrm{wght}}(I, w, (\Phi_1, \ldots, \Phi_n), T) = \Big(\sum_{i=1}^n w_i(\Phi_i, T)\Big)^{-1} \sum_{i=1}^n w_i(\Phi_i, T)\, \hat\mu_f(I, \Phi_i, T). \tag{13}$$

Note that the functions $w_i$ might also depend on $f$. With $w_1 = \ldots = w_n = n^{-1}$, we
obtain as a special case

$$\hat{\tilde\mu}_f^n(I) = \hat{\tilde\mu}_f^n(I, (\Phi_1, \ldots, \Phi_n), T) = n^{-1} \sum_{i=1}^n \hat\mu_f(I, \Phi_i, T). \tag{14}$$

In order to estimate $\mu_f^{(2)}(I)$ consistently, according to the decomposition in (9), the
weights have essentially to be chosen as

$$w_i(\Phi_i, T) = \hat\alpha^{(2)}(C(T, I), \Phi_i)/v_T = \sum^{\neq}_{t_1, t_2 \in \Phi_{i,g}} \mathbf{1}_{(t_1,t_2) \in C(T, I)} \Big/ v_T, \tag{15}$$

where $v_T$ is the volume of the cube $[0, T]$. By Proposition A.1, $\hat\alpha^{(2)}(C(T, I), \Phi_i)/v_T$
converges to $\alpha_{1,\Phi|Q=Q_i}^{(2)}(C(I))$ a.s. as $T \to \infty$, where $Q_i$ is the realized ergodicity class
of $\Phi_i$. With $w$ being the vector of weights from (15), we define

$$\hat\mu_f^n(I) = \hat\mu_f^n(I, (\Phi_1, \ldots, \Phi_n), T) = \hat\mu_f^{n,\mathrm{wght}}(I, w, (\Phi_1, \ldots, \Phi_n), T), \tag{16}$$

which, in a sense, represents the family of all pairs of points with a distance contained in
$I$ from all realizations. This choice of weights satisfies the above stochastic convergence
condition (12) and is sufficient but not necessary for consistency. The following theorem
gives a weaker set of conditions that is still sufficient for consistency.
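
The two ways of combining per-realization estimates can be compared in a short Python sketch. The function names and the numbers are hypothetical; the inputs are assumed to be per-realization estimates $\hat\mu_f(I, \Phi_i, T)$ and pair counts that have already been computed.

```python
def weighted_combination(mu_hats, weights):
    """Estimator of the form (13): sum_i w_i * mu_hat_i / sum_i w_i."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * m for w, m in zip(weights, mu_hats)) / total


def equally_weighted(mu_hats):
    """Special case (14): w_1 = ... = w_n = 1/n."""
    n = len(mu_hats)
    return weighted_combination(mu_hats, [1.0 / n] * n)


def pair_count_weighted(mu_hats, pair_counts, window_volume):
    """Weights in the spirit of (15): number of point pairs per unit volume."""
    weights = [count / window_volume for count in pair_counts]
    return weighted_combination(mu_hats, weights)


# Two hypothetical realizations: few pairs with a large estimate, many with a small one.
mu_hats = [20.0, 10.0]
pair_counts = [10, 90]

typical_realization = equally_weighted(mu_hats)
typical_pair = pair_count_weighted(mu_hats, pair_counts, window_volume=5.0)
```

With pair-count weights, and with $z(t) = 1$, the combination equals the pooled ratio of summed numerators and summed pair counts, so realizations with many point pairs dominate; equal weights treat every realization alike.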
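Theorem 4.2. Let $\Phi_i$, $i \in \mathbb{N}$, be iid copies of a possibly non-ergodic MPP $\Phi$ and let $Q_i$
denote the respective ergodicity classes. For weight functions $\tilde w_i : M_0 \times [0, \infty)^d \to [0, \infty)$
and iid random factors $W_i$ with $E|W_i| < \infty$, $i \in \mathbb{N}$, let $w_i(\Phi_i, T) = W_i \cdot \tilde w_i(\Phi_i, T)$ and
$w = (w_1(\Phi_1, T), \ldots, w_n(\Phi_n, T))$. Then, $\hat\mu_f^{n,\mathrm{wght}}(I, w)$ is consistent for $\mu_f^{(2)}(I)$ if the
following conditions hold:

$$W_i > 0 \quad \text{a.s.}, \tag{17}$$

$$\operatorname{Var} \tilde w_i(\Phi_i, T) < c_1 \quad \text{for some } c_1 > 0, \tag{18}$$

$$n^{-1} E \sum_{i=1}^n \tilde w_i(\Phi_i, T) > c_2 > 0 \quad \forall n > n_0 \text{ for some } n_0 \in \mathbb{N}, \tag{19}$$

$$E\left[W_i \tilde w_i(\Phi_i, T)\right] = E[W_i] \cdot E\left[\tilde w_i(\Phi_i, T)\right], \tag{20}$$

$$E\left[W_i\, \hat\alpha^{(2)}(C(T, I), \Phi_i)\, \mu_{f,\Phi|Q=Q_i}^{(2)}(I)\right] = E[W_i] \cdot E\left[\hat\alpha^{(2)}(C(T, I), \Phi_i)\, \mu_{f,\Phi|Q=Q_i}^{(2)}(I)\right], \tag{21}$$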
(n,T^oo) (22) 



max 

8=1 



«( 2 )(C(T,/),$ i )E; =1 iiiift,T) 
Proof. We consider 



> c 3 



< 



EILx Wi(*i,T) [/!/(/, *<, T) - M^i |Q=g . (/)] 



£? =1 T) 



E?=iW i «) i ($ i ,T)^ |Q=Q . (7) 



(23) 



(24) 



By Lemma 4.1, $\hat\mu_f(I, \Phi_i, T)$ is consistent (for $T \to \infty$) within the respective ergodicity
class. Thus, (23) converges to 0 in probability if $T \to \infty$. Using the short notation
$\hat\alpha_j = \hat\alpha^{(2)}(C(T, I), \Phi_j)$ and $\tilde w_j = \tilde w_j(\Phi_j, T)$, we have



121 



ELi Wjay 



< max ■ 

i=i 



En 
7 = 1 "J 



En 
5=1 



Ei=i^ 



OH E™=1 11 >, 

En ~ 
.7=1 W J 



En 
.7 = 1 



E? =1 ^ 



J 3 



YZ=iWiai[»% lQ=Qji (I)-tf>(I)] 



(2), 



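Since by assumption, $(n^{-1} E \sum_{i=1}^n \tilde w_i)_{n \in \mathbb{N}}$ is eventually bounded away from 0 and
the variance of the $\tilde w_i$ is uniformly bounded, the law of large numbers yields that
$\sum_{j=1}^n \tilde w_j \big/ E \sum_{j=1}^n \tilde w_j$ and $\sum_{j=1}^n W_j \tilde w_j \big/ E \sum_{j=1}^n W_j \tilde w_j$ converge to 1 in probability.
Additionally using that $E[W_j \tilde w_j] = E W_j\, E \tilde w_j$, for $n \to \infty$, we get the convergence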



<7'=1 



E-=i%7EE-=i 



EE"=i% 



1 



E"=i ELi E -=i e E -=i Wjwj 



iWi 
as $n \to \infty$. Similarly, for $n \to \infty$ and $n, T \to \infty$, we have
gU WQaj E[Wiai] 



E 




(2) 




E 


(2) 


(CV))_ 





respectively. Together with (22) we obtain that (24) converges to 0 in probability,
which completes the proof.

Note that if $\tilde w_i = \tilde w$ for all $i \in \mathbb{N}$ for some weight function $\tilde w$ with $E|\tilde w(\Phi, T)| < \infty$,
the w)i(<J>i,T) are iid and conditions (IT51) . (fTTJ)) and (PD|) become obsolete. 

Now we turn to the estimation of $\tilde\mu_f^{(2)}(I)$. By construction (cf. Definition 3.1),
$\hat{\tilde\mu}_f^n(I)$ consistently estimates $\tilde\mu_f^{(2)}(I)$; in contrast to $\hat\mu_f^n(I)$, it reflects a random pair of
points with distance in $I$ within a randomly chosen ergodicity class. Again, also other
choices of weights are feasible for consistent estimation of $\tilde\mu_f^{(2)}(I)$, apart from the choice
$w_i(\Phi_i, T) = 1$. By replacing $\hat\alpha^{(2)}(C(T, I), \Phi_i)$ by the constant 1 in Theorem 4.2 we
get the following corollary.

Corollary 4.1. Under the assumptions of Theorem 4.2 with $\hat\alpha^{(2)}(C(T, I), \Phi_i)$ being
replaced by the constant 1, $\hat\mu_f^{n,\mathrm{wght}}(I, w)$ is a consistent estimator for $\tilde\mu_f^{(2)}(I)$.

Remark 4.1. If $\Phi$ is ergodic, $\hat\mu_f^{n,\mathrm{wght}}(I, w)$ is consistent for $\mu_f^{(2)}(I)$ (as $T \to \infty$) for
any choice of weights $w$ that satisfies (12). Note that in this case, consistency is
independent of $n$, which can be fixed to any finite value.

Proof. If $\Phi$ is ergodic, the mixing measure $\lambda$ is the one-point distribution $\delta_P$ and
condition (12) simply means stochastic convergence of the weights w.r.t. $P$. The
assertion directly follows from the continuous mapping theorem.

4.3. Variance minimization 

In what follows, we seek an optimal consistent estimator for $\bar\mu_f^{(2)}(I)$ in the sense
of minimal variance. We introduce some additional assumptions on the mark-location
dependence for analytical tractability. For simplicity, we set $w_i(\Phi_i,T) = 1$, i.e., we
consider $W_i(\Phi_i,T) = W_i$. Let $\mathcal{A}^*_n$ denote the $\sigma$-algebra generated by the unmarked
ground processes $\Phi_{1,g},\dots,\Phi_{n,g}$, i.e., $\mathcal{A}^*_n = \sigma(\{\{\omega : \Phi_{i,g}(\omega)(B) = k\} : k \in \mathbb{N}, B \in \mathcal{B}, i = 1,\dots,n\})$.
We assume that $\mathbb{E}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n]$ is a.s. constant. We further assume
that $\mathcal{A}^*_n$ is maximal w.r.t. this property and that $\operatorname{Var}[\hat\mu_f(I,\Phi,T) \mid \mathcal{A}^*_n]$ is independent
of the random ergodicity class $Q$.

Proposition 4.1. With the above notation and assumptions, the variance minimizing
weights for $\hat\mu_f^{n,\mathrm{wght}}(I, w, (\Phi_1,\dots,\Phi_n), T)$ that satisfy [F7\)-(M§ with $\hat\alpha^{(2)}(C(T,I),\Phi_i)$
being replaced by 1 are given by

$$w_i(\Phi_i,T) = W_i = \operatorname{Var}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n]^{-1}.$$

Note that an analogous variance minimizing procedure via random factors $W_i$ could also
be included into the estimator $\hat\mu_f^{n}$ of $\bar\mu_f^{(2)}(I)$.
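Proof of Proposition 4.1. For general $\mathcal{A}^*_n$-measurable weights $w_i(\Phi_i,T)$, $i = 1,\dots,n$,
the law of total variance and the independence of the $\Phi_i$ give

$$\begin{aligned}
\operatorname{Var}\big[\hat\mu_f^{n,\mathrm{wght}}(I, w, (\Phi_1,\dots,\Phi_n), T)\big]
&= \mathbb{E}\Big[\frac{1}{\big(\sum_{i=1}^n w_i(\Phi_i,T)\big)^2} \sum_{i=1}^n w_i(\Phi_i,T)^2 \operatorname{Var}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n]\Big] \\
&\quad + \operatorname{Var}\Big[\frac{1}{\sum_{i=1}^n w_i(\Phi_i,T)} \sum_{i=1}^n w_i(\Phi_i,T)\, \mathbb{E}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n]\Big] \\
&= \mathbb{E} \sum_{i=1}^n w_i^{\mathrm{rel}}(\Phi_i,T)^2 \operatorname{Var}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n] \qquad (25)
\end{aligned}$$

with $w_i^{\mathrm{rel}}(\Phi_i,T) = w_i(\Phi_i,T)/\sum_{k=1}^n w_k(\Phi_k,T)$. The second term vanishes because
$\mathbb{E}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n]$ is a.s. constant by assumption. Since any weighted average $\sum_i w_i^2 a_i$
with $a_i > 0$ and $\sum_i w_i = 1$ is minimized by $w_j = a_j^{-1} / \sum_i a_i^{-1}$ (Lagrange method), the
unconditional variance (25) is minimized by choosing

$$w_i(\Phi_i,T) = W_i = \operatorname{Var}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n]^{-1}.$$

The $W_i$ are $\mathcal{A}^*_n$-measurable by definition of the conditional variance and satisfy (fTT)) -
(f2"2l with $\hat\alpha^{(2)}(C(T,I),\Phi_i)$ being replaced by 1. Maximality of $\mathcal{A}^*_n$ ensures optimality
of the weights.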
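
The Lagrange step in the proof can be checked numerically. The following is a minimal sketch, assuming three independent realizations with made-up conditional variances (the array `cond_var` and both helper names are hypothetical, not part of the paper); it compares inverse-variance weights with equal weights.

```python
import numpy as np

def relative_weights(cond_var):
    """Normalized inverse-variance weights w_j = a_j^{-1} / sum_i a_i^{-1}."""
    inv = 1.0 / np.asarray(cond_var, dtype=float)
    return inv / inv.sum()

def weighted_variance(w_rel, cond_var):
    """Variance sum_i w_i^2 a_i of a weighted average of independent terms."""
    w_rel = np.asarray(w_rel, dtype=float)
    return float(np.sum(w_rel ** 2 * np.asarray(cond_var, dtype=float)))

# Hypothetical conditional variances of the per-realization estimators.
cond_var = np.array([0.5, 2.0, 4.0])

w_opt = relative_weights(cond_var)       # inverse-variance weights
w_equal = np.full(3, 1.0 / 3.0)          # unweighted average

v_opt = weighted_variance(w_opt, cond_var)
v_equal = weighted_variance(w_equal, cond_var)
```

Under these assumed inputs, `v_opt` equals $1/\sum_i a_i^{-1}$ and is smaller than `v_equal`; any other normalized weight vector yields a variance of at least `v_opt`.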



If there exist interaction effects in the MPP that are of higher than second order,
the assumption on $\mathbb{E}[\hat\mu_f(I,\Phi,T) \mid \mathcal{A}^*_n]$ might not be satisfied anymore and weighting
according to the above conditional variances should be handled with care. Clusters
of point locations, which tend to increase the conditional variance of $\hat\mu_f$ given the
ground process, can additionally influence the mean of other marks in excess of the
bivariate interaction measured by $\mu_f^{(2)}(I)$. Then, a bias will be introduced by using the
above random weights. More generally, the more is known about the relation between
$\hat\mu_f(I,\Phi,T)$ and the ground process $\Phi_g$, the more can be gained from using different
(random) weights while preserving consistency of the estimator. Without any assumption,
only deterministic or independent weights are feasible and then $w_i(\Phi_i,T) = 1$ is
naturally the best choice, i.e., the use of $\hat\mu_f^{n}(I)$.

We consider two simple examples of optimal weighting in the following. Here
we assume that the $z$-components of the marks are 1 for all points. Recall that
$\hat\mu_f(I,\Phi,T) = \hat\alpha_f(I,\Phi,T)/\hat\alpha_1(I,\Phi,T)$, that the denominator is $\mathcal{A}^*_n$-measurable, and
that $\hat\alpha_f(I,\Phi,T)$ is a sum consisting of $\hat\alpha_1(I,\Phi,T)$ random summands.

Remark 4.2. In general, the summands of $\hat\alpha_f(I,\Phi_i,T)$ are not iid. However, if
conditionally on $\mathcal{A}^*_n$, the summands were iid with variance $v$, the conditional variance
$\operatorname{Var}[\hat\mu_f(I,\Phi_i,T) \mid \mathcal{A}^*_n]$ would be $v/\hat\alpha_1(I,\Phi_i,T)$.

In the following scenarios, we assume $f$ to depend on its first argument only. The
proofs are given in Appendix C.

Example 4.1. Let $\Phi$ have marks that are stochastically independent of the process
of point locations and let these point locations be fully regularly spaced in every
realization. Let $\nu_T$ and $N = N(T)$ denote the volume of $[0,T]$ and the random
number of points in $[0,T]$, respectively, and assume that the $f(y_i)$, $i \in \mathbb{Z}$, are iid with
variance $v$. Then, asymptotically, $\operatorname{Var}[\hat\mu_f(I,\Phi,T) \mid \mathcal{A}^*_n] \sim v/N$ and the resulting weights
are $W_i = N_i/v$, where $N_i$ denotes the number of points within the $i$-th realization.

Since $N(T)$ is usually much smaller than $\hat\alpha_1(I,\Phi,T)$, the variance $\operatorname{Var}[\hat\mu_f(I,\Phi,T) \mid \mathcal{A}^*_n]$
in Proposition 4.1 is larger than the one in the hypothetical example in Remark 4.2.

In the following example, we consider arbitrary point locations but still assume 
independence between marks and locations. 

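Example 4.2. Let $\Phi_g$ be a one-dimensional, stationary unmarked point process and
$Y$ a stationary continuous-time process which is independent of $\Phi_g$ and such that $f(Y)$
has finite second moments. We consider the MPP $\Phi = \{(t, Y(t), 1) : t \in \Phi_g\}$. Then

$$\operatorname{Var}[\hat\mu_f(I,\Phi,T) \mid \mathcal{A}^*_n]
= \frac{\sum_{t_1 \in \Phi_g \cap [0,T]} \sum_{s_1 \in \Phi_g \cap [0,T]} \operatorname{Cov}[f(Y(t_1)), f(Y(s_1))]\, n(t_1,\Phi_g,I)\, n(s_1,\Phi_g,I)}{\big(\sum_{t_1 \in \Phi_g \cap [0,T]} n(t_1,\Phi_g,I)\big)^2},$$

where $n(t_1,\Phi_g,I) = \sum_{t_2 \in \Phi_g \setminus \{t_1\}} \mathbf{1}_{t_2 - t_1 \in I}$.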
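
The conditional variance in Example 4.2 can be evaluated directly from the point locations. The sketch below is illustrative only: it assumes that all supplied points lie in $[0,T]$, that $I = [\mathrm{lo}, \mathrm{hi}]$, and that the covariance of $f(Y)$ is supplied as a function `cov` of the lag; the function names are hypothetical and edge effects from points outside the window are ignored.

```python
import numpy as np

def neighbour_counts(points, lo, hi):
    """n(t1, Phi_g, I): number of other points t2 with t2 - t1 in I = [lo, hi]."""
    pts = np.asarray(points, dtype=float)
    diff = pts[None, :] - pts[:, None]          # diff[i, j] = t_j - t_i
    in_I = (diff >= lo) & (diff <= hi)
    np.fill_diagonal(in_I, False)               # exclude t2 = t1
    return in_I.sum(axis=1)

def conditional_variance(points, lo, hi, cov):
    """Var[mu_hat_f | ground process] when marks are independent of locations."""
    pts = np.asarray(points, dtype=float)
    n = neighbour_counts(pts, lo, hi).astype(float)
    C = cov(pts[:, None] - pts[None, :])        # Cov[f(Y(t1)), f(Y(s1))] by lag
    return float(n @ C @ n / n.sum() ** 2)

# Regularly spaced points with uncorrelated marks of variance 2 (assumed values).
points = np.arange(10.0)
white_noise = lambda h: 2.0 * (h == 0)
var_regular = conditional_variance(points, 0.5, 1.5, white_noise)
```

For this assumed configuration nine points have exactly one neighbour at lag 1, so the result is $v \cdot 9 / 9^2 = 2/9$, in line with the $v/N$ behaviour described in Example 4.1.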
4.4. Remarks 

Remark 4.3. The weighting of multiple realizations and the intrinsically weighted
means coincide in the following sense: Let $\Phi_1,\dots,\Phi_n$ be iid copies of an MPP $\Phi =
\{(t_i, y_i, 1) : i \in \mathbb{N}\}$, for which the second mark component equals 1 for all points. Then
the weighting of realizations via $w_i(\Phi_i,T)$ in the estimator (15) can alternatively be
captured by the second mark component. For $i = 1,\dots,n$, let $\tilde\Phi_i = \{(t, y, w_i^{\mathrm{rel}}(\Phi_i,T)) :
(t,y,1) \in \Phi_i\}$, where $w_i^{\mathrm{rel}}(\Phi_i,T) = w_i(\Phi_i,T)/\sum_{k=1}^n w_k(\Phi_k,T)$. Let $\tilde\Phi^n$ be the
concatenation of the processes $\tilde\Phi_1,\dots,\tilde\Phi_n$, each restricted to the observation window
$[0,T]$ and concatenated with a buffer of $\max(I)$ and such that all points of $\tilde\Phi^n$ are
contained in $[0,T_n]$ for some $T_n \in \mathbb{R}^d$. Then, with $w = (w_i(\Phi_i,T))_{i=1}^n$, we have

$$\hat\mu_f(I, \tilde\Phi^n, T_n) = \hat\mu_f^{n,\mathrm{wght}}(I, w, (\Phi_1,\dots,\Phi_n), T).$$

We close this section with a note on the estimation of $\mu_f^{(2)}(r)$ and $\bar\mu_f^{(2)}(r)$, $r \in \mathbb{R}$.

Remark 4.4. For most MPPs used in applications, finding two points of an MPP with
a fixed distance $r$ within a bounded observation window has probability zero. Then
the simplest approach is to apply any of the estimators (TTTI) . (TT3]), (TJ3} or (Til)]) , with $I$
being a small interval containing $r$, e.g., $[r - \delta, r + \delta]$ for some $\delta > 0$. This is equivalent
to using (Nadaraya-Watson) kernel regression with the rectangular kernel, applied to
the tuples $\{(z_1 f(y_1), \mathrm{dist}(t_2 - t_1)) : (t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi\}$, where $\mathrm{dist}(x) = x$ if
$x \in \mathbb{R}^1$ and $\mathrm{dist}(x) = \|x\|$ if $x \in \mathbb{R}^d$ with $d > 1$.

An obvious generalization is to replace the rectangular kernel by a general kernel $K_h$
with bandwidth $h$. For the basic estimator this yields

$$\hat\mu_f(r,\Phi,T) = \frac{\sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi,\ t_1 \in [0,T]} z_1 f(y_1)\, K_h(r - \mathrm{dist}(t_2 - t_1))}{\sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2) \in \Phi,\ t_1 \in [0,T]} z_1\, K_h(r - \mathrm{dist}(t_2 - t_1))},$$

likewise for the other estimators. If the support of $K_h$ covers the whole real line, the
denominator is always strictly larger than zero, which simplifies implementation, but
also allows $\hat\mu_f(r,\Phi,T)$ to be driven by pairs of points whose distance differs largely
from $r$.
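
A kernel version of the basic estimator might be sketched as follows for one-dimensional locations. The Gaussian kernel, the function name `kernel_mean_mark` and the toy data are assumptions made for illustration; all supplied points are taken to lie in the observation window, and $\mathrm{dist}(x) = x$ is the signed difference as in the case $d = 1$ above.

```python
import numpy as np

def kernel_mean_mark(t, y, z, r, h, f=lambda v: v):
    """Kernel-smoothed mean mark at distance r (1-D locations, Gaussian kernel)."""
    t, y, z = (np.asarray(a, dtype=float) for a in (t, y, z))
    d = t[None, :] - t[:, None]                 # d[i, j] = t_j - t_i (signed)
    k = np.exp(-0.5 * ((r - d) / h) ** 2)       # unnormalized Gaussian kernel K_h
    np.fill_diagonal(k, 0.0)                    # only pairs of distinct points
    num = np.sum(z[:, None] * f(y)[:, None] * k)
    den = np.sum(z[:, None] * k)
    return num / den

# Toy data: four points on a unit grid, unit z-components (assumed values).
t = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 2.0, 3.0, 10.0]
z = [1.0, 1.0, 1.0, 1.0]
est_narrow = kernel_mean_mark(t, y, z, r=1.0, h=0.01)
est_wide = kernel_mean_mark(t, y, z, r=1.0, h=5.0)
```

With the narrow bandwidth only the three pairs at lag exactly 1 contribute, so the estimate is the average of the first three marks. With the wide bandwidth, pairs at other lags also enter, which illustrates the trade-off mentioned in the remark.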

5. Application to continuous-space processes 

Picking up the introductory example on continuous-space processes, taking measure- 
ments from such a process with measurement locations that are possibly irregularly 
spaced but independent of the underlying process, leads to a subclass of MPPs. At the 
same time, particularly developed in the geostatistical context, there exist numerous 
methods of inference for continuous-space processes, including methods to account for 
biased and preferential sampling. We compare the concept of intrinsically weighted 
means of MPPs to statistical methods for continuous-space processes in the following. 

One of the classical problems in geostatistical applications (e.g., [4]) is prediction of
averages from measurements $\{(t_i, Y(t_i)) : i = 1,\dots,n\}$, where $\{Y_t : t \in \mathcal{T}\}$, $\mathcal{T} \subset \mathbb{R}^d$, is
a latent second-order stationary random field. When predicting global moments of $Y$,
redundancies in the data can be excluded via the spatial correlation structure, e.g., the
best linear unbiased estimator (BLUE) for $\mathbb{E}Y$ is well-known to be $(\mathbf{1}'\Sigma^{-1}\mathbf{1})^{-1} \cdot \mathbf{1}'\Sigma^{-1}\mathbf{Y}$,
where $\mathbf{1} = (1,\dots,1)'$, $\mathbf{Y} = (Y(t_1),\dots,Y(t_n))'$ and $\Sigma = (\operatorname{Cov}(Y(t_i), Y(t_j)))_{i,j=1}^n$ (e.g., H
p. 179]). More generally, any estimator that is linear in a transformation $g$ of the data
allows for assigning a different weight to each data point; then the estimator takes the
form $\sum_{i=1}^n z_i\, g(Y(t_i))$ or $\sum_{i,j=1}^n z_{ij}\, g(Y(t_i),Y(t_j))$ (similarly for higher-order moments).
The weights $z_i$ and $z_{ij}$ are supposed to capture the spatial or temporal pattern of
measurement locations when statistical inference from irregularly spaced data is carried
out. Similar weighting procedures are used for declustering and debiasing methods, cf. 

m. 
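
The BLUE weights can be computed with a single linear solve. The sketch below is an illustration under assumptions that are not taken from the paper: one-dimensional locations, an exponential covariance function, and a hypothetical helper `blue_weights`.

```python
import numpy as np

def blue_weights(locations, cov):
    """Weights z such that z'Y = (1' S^{-1} 1)^{-1} 1' S^{-1} Y."""
    t = np.asarray(locations, dtype=float)
    S = cov(np.abs(t[:, None] - t[None, :]))    # covariance matrix Sigma
    s_inv_one = np.linalg.solve(S, np.ones(len(t)))
    return s_inv_one / s_inv_one.sum()

# Three clustered measurement locations and one isolated location (assumed).
exp_cov = lambda h: np.exp(-h)
z_clustered = blue_weights([0.0, 0.1, 0.2, 5.0], exp_cov)

# With uncorrelated data the BLUE reduces to the ordinary sample mean.
uncorrelated = lambda h: (h == 0).astype(float)
z_iid = blue_weights([0.0, 0.1, 0.2, 5.0], uncorrelated)
```

In this assumed setting the isolated location receives a larger weight than any single clustered location, which is the declustering effect described above.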

Assertion 5.1. Identifying the geostatistical weights $z_i$ with the $z$-component of the
marked point process $\Phi = \{(t_i, y_i, z_i) : i \in \mathbb{N}\}$, the estimator $\sum_i z_i\, g(Y(t_i))$ of $\mathbb{E}g(Y)$
coincides with the canonical estimator for the weighted mean mark $\mu_g$, defined by (2).

The geostatistical guiding principle of choosing optimal weights for aggregation of 
measurements adheres to the idea that a) there exists an underlying random field and 
b) that this field can be measured at any location without causally influencing the 
other measurements. It is important to note that this is far from being satisfied for 
processes in which the measurements reflect physical objects that interact with each 
other. Trees in a forest, for example, compete for resources and if another tree had been 
added at some point, the measured characteristics of the surrounding trees would have 
likely changed. With increasing distance, however, interaction effects between single
objects of an MPP may become negligible and the random field assumption might be
sensible on a larger scale. This perspective motivates combining classical mean mark
estimators for MPPs of the form $\Phi = \{(t_i, y_i, 1) : i \in \mathbb{N}\}$ with a geostatistical weighting.
Partitioning the observation window into smaller parts, we assign a $z$-component to $\Phi$
such that $z_i = z_j$ whenever $t_i$ and $t_j$ belong to the same cell of the partition. This
leads to a classical unweighted average within each cell and thereby maintains the
information contained in the small-scale pattern of the point locations. Between the
different cells, we allow for a weighting in the geostatistical sense, which smooths out
large-scale irregularities in the distribution of point locations. We denote
the resulting estimator by $\hat\mu_f^{n,\mathrm{geo}}$.



Intrinsically Weighted Means of Marked Point Processes 






Assertion 5.2. Considering a realization of $\Phi$ as a collection of realizations of a
possibly non-ergodic MPP on smaller observation windows corresponding to the above
partition, the form of $\hat\mu_f^{n,\mathrm{geo}}$ coincides with that of $\hat\mu_f^{n}$ and $\hat\mu_f^{n,\mathrm{wght}}$, which estimate the
average mean mark $\bar\mu_f$ (see Definition 3.1) instead of the classical mean mark $\mu_f$.

The application of such a weighting scheme is particularly of interest when the under- 
lying process jumps between different regimes that differ substantially from each other, 
e.g., w.r.t. the intensity of point locations. In summary, applying the geostatistical idea 
of declustering in the MPP context in a sense corresponds to the concept of non-ergodic 
modeling. 

To avoid possible confusion, we conclude this section with a final remark. 

Remark 5.1. For certain choices of $f$, the random field counterpart of $\mu_f^{(2)}$ is
well-defined. For $f(y_1,y_2) = y_1 y_2$, for instance, the counterpart is the ordinary
(non-centered) covariance function. If $f$ only depends on one of the two marks of a pair
of points, $\mu_f^{(2)}$ implicitly conditions on the existence of other points and there is no
sensible way of interpreting such a statistic in a random field context, where there
exist values at all points of the index space. Nevertheless, the geostatistical idea of
variance-minimizing weights can be applied to $\mu_f^{(2)}$ by a simple mean squared error
approach.

6. Discussion 

The MPP summary statistics considered in this paper are (weighted) mean marks. 
In practice, the choice of weights is not always clear, for example when data from 
different stochastic sources are combined. In Section 5 we point out that, if there
were an underlying continuous-time process from which the data were generated by a
random sampling procedure, then the mean of interest would rather be the temporal
average over the whole index space instead of the average over all sampling locations.
The weights might then be chosen to compensate for the irregular distribution of
point locations. However, the assumption of a continuous-time background process is
problematic if the points represent physical objects that influence each other. Then,
the mean of interest might include the randomness of the point pattern, as it is reflected
by the second-order MPP moment measures.

Related questions arise when multiple realizations of a non-ergodic MPP are considered:
Should the definition of mean include possibly different intensities of points
between different ergodicity classes or not? A non-ergodic MPP can be seen as a
hierarchical model and expectation functionals w.r.t. the point process can naturally
be replaced by two-step expectations by averaging within each ergodicity class first
and then aggregating the different classes (cf. Section 3). This alternative definition
filters out the differences w.r.t. the point location patterns between different ergodicity 
classes. Which definition of mean should be chosen eventually depends on the purpose 
of the characteristic at hand and on the intended interpretation. 

Appendix A. Ergodic theory 

Ergodicity is a mixing property that can be defined in the very general context
of dynamical systems. An MPP on $\mathbb{R}^d$ together with the group of $\mathbb{R}^d$-indexed shift
operators is a special case of a dynamical system.

We denote by $M_0$ the set of all locally finite counting measures on $\mathbb{R}^d \times \mathbb{R}$, and by
$\mathcal{M}_0$ the smallest $\sigma$-algebra on $M_0$ that makes all mappings $M_0 \to \mathbb{N}_0 \cup \{\infty\}$, $\varphi \mapsto \varphi(S)$,
measurable. Formally, an MPP $\Phi$ is a measurable mapping from some probability space
$(\Omega,\mathcal{A},P)$ into $(M_0,\mathcal{M}_0)$ and we can identify $(\Omega,\mathcal{A})$ with $(M_0,\mathcal{M}_0)$ in the usual way.
Let $\mathcal{T} = \{T_x : x \in \mathbb{R}^d\}$ with

$$(T_x\varphi)(B \times L) = \varphi((B + x) \times L), \quad B \in \mathcal{B}^d,\ L \in \mathcal{B}. \qquad (26)$$

Recall that $\Phi$ is said to be stationary if the induced probability measure $P_\Phi$ is
$\mathcal{T}$-invariant. Further, a stationary MPP $\Phi$ is called ergodic if $P_\Phi(A)$ is either zero or one
for all $\mathcal{T}$-invariant sets $A \in \mathcal{M}_0$. Let $\mathcal{A}_0 \subset \mathcal{M}_0$ be the sub-$\sigma$-algebra of all $\mathcal{T}$-invariant
sets in $\mathcal{M}_0$, i.e., $A = T^{-1}A$ for all $A \in \mathcal{A}_0$ and $T \in \mathcal{T}$.

The following theorem is commonly termed the pointwise or individual ergodic theorem
in the literature and establishes almost sure convergence of a certain average of values of
a random variable $X$.

Definition A.1. (Def. 12.2.I in [7].) An increasing sequence of bounded convex Borel
sets $W_n \subset \mathbb{R}^d$ is called a convex averaging sequence in $\mathbb{R}^d$ if the maximal radius of a ball
contained in $W_n$ goes to infinity as $n$ increases.

Theorem A.1. (Prop. 12.2.II in [7].) Let $(\Omega,\mathcal{A},P)$ be a probability space and $\mathcal{T} = \{T_x :
x \in \mathbb{R}^d\}$ a group of measure-preserving transformations acting on $(\Omega,\mathcal{A},P)$ such that
the mapping $(T_x,\omega) \mapsto T_x\omega$ is jointly measurable, i.e., $(\mathcal{B}(\mathcal{T}) \otimes \mathcal{A}, \mathcal{A})$-measurable.
(Multiplication in $\mathcal{T}$ is given by $T_x T_y = T_{x+y}$.) Let $\{W_n\}_{n\in\mathbb{N}}$ be a convex averaging
sequence in $\mathbb{R}^d$ and $\mathcal{A}_0$ the $\sigma$-algebra of $\mathcal{T}$-invariant events. Then for all real-valued
integrable functions $X$ on $(\Omega,\mathcal{A},P)$

$$X_n(\omega) = \frac{1}{\nu(W_n)} \int_{W_n} X(T_x\omega)\,\nu(dx) \xrightarrow{n\to\infty} \mathbb{E}(X \mid \mathcal{A}_0) \quad \text{a.s.}$$

If $X$ is additionally $L^p$-integrable, then $\mathbb{E}(X \mid \mathcal{A}_0)$ is also the $L^p$-limit of $X_n$.

Remark A.1. If $P$ is ergodic (i.e., $P(A) \in \{0,1\}$ for all $A \in \mathcal{A}_0$) then $\mathbb{E}(X \mid \mathcal{A}_0)$ reduces
to the constant $\mathbb{E}X$. Loosely speaking, this means that a suitable average over
transformations of a single realization converges to the expectation over the state space $\Omega$.

While Theorem A.1 refers to a general probability space with a general group of
transformations acting on it, the following proposition relates this result to the context
of MPPs on $\mathbb{R}^d$, in which the transformations $T_x$, $x \in \mathbb{R}^d$, are given by shifts of the
whole point pattern by the vector $x$. Here, the point is that the index $x \in \mathbb{R}^d$ has
a direct geometric meaning when $T_x$ is applied to a realization $\varphi$ of $\Phi$. This yields
convergence of spatial averages within a single realization of the MPP to the state
space mean.

The proof of the following proposition is based on a simple sandwich argument,
which can also be used for other consistency statements. We include the proof here
because, to our knowledge, it is not available in this form in the pertinent literature. A
similar assertion can be found in [7, Thm. 12.2.IV].

Proposition A.1. Let $\Phi$ be stationary and ergodic and $\mathcal{T}$ as in Theorem A.1. Let
$f : \mathbb{R}^d \times \mathbb{R} \times M_0 \to \mathbb{R}$ be a non-negative function that satisfies $f(t-x, y, T_x\varphi) = f(t, y, \varphi)$
for all $t, x \in \mathbb{R}^d$, $y \in \mathbb{R}$, and that is integrable w.r.t. the marked Campbell measure
$C(B \times L \times M) = \mathbb{E}[\Phi((B \cap [0,1]^d) \times L)\,\mathbf{1}_M(\Phi)]$, $B \in \mathcal{B}^d$, $L \in \mathcal{B}$, $M \in \mathcal{M}_0$. We define
random variables $X, X_n : M_0 \to \mathbb{R}$ by

$$X(\varphi) = \sum_{(t,y)\in\varphi,\ t\in[0,1]^d} f(t,y,\varphi), \qquad
X_n(\varphi) = \frac{1}{n^d} \sum_{(t,y)\in\varphi,\ t\in[0,n]^d} f(t,y,\varphi).$$

Then $X_n$ converges to $\mathbb{E}_\Phi X$ almost surely as $n \to \infty$.

Proof. An extension of the classical Campbell theorem (e.g., Lem. 13.1.II in [7])
guarantees that $\mathbb{E}|X| < \infty$ if $f$ is integrable w.r.t. the Campbell measure. The $W_n =
[0,n]^d$ obviously form a convex averaging sequence and

$$X_n(\varphi) = \frac{1}{\nu(W_n)} \int_{\mathbb{R}^d} \sum_{(t,y)\in\varphi,\ t\in W_n\cap[x-1,x]} f(t,y,\varphi)\,\nu(dx), \qquad (27)$$

where $x \pm 1$ for $x \in \mathbb{R}^d$ is defined component-wise. Note that the integrand on the
RHS equals 0 whenever $W_n \cap [x-1,x] = \emptyset$, which means that $x$ is not contained
in $W_n \oplus [0,1]^d$, which is, in turn, a subset of $W_{n+1}$. Thus, we can shrink the
region of integration to $W_{n+1}$ without changing the integral. If we then drop the
condition '$t \in W_n$' under the summation sign, we enlarge the whole expression since $f$
is non-negative, i.e.

$$\begin{aligned}
X_n(\varphi) &\le \frac{1}{\nu(W_n)} \int_{W_{n+1}} \sum_{(t,y)\in\varphi,\ t\in[x-1,x]} f(t,y,\varphi)\,\nu(dx) \\
&= \frac{1}{\nu(W_n)} \int_{W_{n+1}} \sum_{(t,y)\in T_{x-1}\varphi,\ t\in[0,1]^d} f(t,y,T_{x-1}\varphi)\,\nu(dx) \\
&= \frac{\nu(W_{n+1})}{\nu(W_n)} \cdot \frac{1}{\nu(W_{n+1})} \int_{W_{n+1}-1} X(T_x\varphi)\,\nu(dx), \qquad (28)
\end{aligned}$$

where the second equation uses that $f(t-x, y, T_x\varphi) = f(t, y, \varphi)$ and the last equation
uses that $\nu$ is shift-invariant. Since the ratio $\nu(W_{n+1})/\nu(W_n)$ converges to 1, Theorem
A.1 yields that the RHS of (28) converges to $\mathbb{E}_\Phi(X \mid \mathcal{A}_0)$ for almost all $\varphi \in M_0$. Since
$\Phi$ was assumed to be ergodic, this conditional expectation equals $\mathbb{E}_\Phi X$.
Similarly, if we restrict integration in (27) to the set $W_{n-1}$, we reduce the value of the
integral. Since $W_{n-1} \oplus [-1,0]^d \subset W_n$, we can again drop the condition '$t \in W_n$' under
the summation sign and by the same argument as before, we have

$$X_n(\varphi) \ge \frac{1}{\nu(W_n)} \int_{W_{n-1}} \sum_{(t,y)\in\varphi,\ t\in[x,x+1]} f(t,y,\varphi)\,\nu(dx) \xrightarrow{n\to\infty} \mathbb{E}_\Phi X$$

for almost all $\varphi \in M_0$. Thus, we have a sandwich relation for $X_n(\varphi)$ and can conclude
that $X_n \to \mathbb{E}_\Phi X$ a.s.
Note that the convex averaging sequence $\{[0,n]^d\}_{n\in\mathbb{N}}$ in Proposition A.1 can be
replaced by any sequence $\{W \oplus nV\}_{n\in\mathbb{N}}$ with $W$ a bounded Borel set and $V \subset \mathbb{R}^d$ a
convex and bounded set with $\nu(V) > 0$ and $0 \in V$.

In case that $\Phi$ is not ergodic, the following results provide a representation of
$\Phi$ as a mixture of a set of ergodic MPPs. To this end, let $\mathcal{P}$ ($\mathcal{P}_{\mathrm{erg}}$ resp.) denote
the set of all probability measures on $(M_0,\mathcal{M}_0)$ induced by stationary (and ergodic)
MPPs and let $\Pi_{\mathrm{erg}}$ be the smallest $\sigma$-algebra making all mappings $\mathcal{P}_{\mathrm{erg}} \to [0,1]$,
$P \mapsto P(A)$, measurable. We say that $\mathcal{T}$ fulfills the condition (LocCompGrp) if $\mathcal{T}$ is
a locally compact, second-countable Hausdorff group of jointly measurable, surjective
transformations.

From [3] we can extract the very general result 

Theorem A.2. Let $(\Omega,\mathcal{A})$ be a measurable space with $\Omega$ a complete separable metric
space and $\mathcal{A}$ its Borel-$\sigma$-algebra. Let $\mathcal{T}$ be a set of measurable transformations of $\Omega$
satisfying the condition (LocCompGrp) and let $P \in \mathcal{P}$. Here, $\mathcal{P}$ ($\mathcal{P}_{\mathrm{erg}}$ resp.) is the set
of all $\mathcal{T}$-invariant (and ergodic) probability measures on $(\Omega,\mathcal{A})$. Then there is a unique
probability measure $\lambda_P$ on $(\mathcal{P}_{\mathrm{erg}}, \Pi_{\mathrm{erg}})$ and a $\mathcal{P}_{\mathrm{erg}}$-valued random variable $Q_P$ s.t.

$$P(A) = \int_{\mathcal{P}_{\mathrm{erg}}} Q(A)\,\lambda_P(dQ) = \int_{\Omega} Q_P(\omega)(A)\,P(d\omega) \quad \forall A \in \mathcal{A},$$

i.e., $\lambda_P$ is the distribution of $Q_P$.

In the context of MPPs on $\mathbb{R}^d$, the group $\mathcal{T}$ of shifts, as defined in (26), obviously
fulfills the condition (LocCompGrp), and since $M_0$ is a complete separable metric space
and $\mathcal{M}_0$ its Borel-$\sigma$-algebra (e.g., [13]), Theorem A.2 can directly be applied, which
yields a decomposition of the non-ergodic MPP $\Phi \sim P$:

$$P(M) = \int_{\mathcal{P}_{\mathrm{erg}}} Q(M)\,\lambda(dQ) \quad \forall M \in \mathcal{M}_0.$$

Note that each $Q$ induces a new ergodic MPP $\Phi_Q : \Omega \to M_0$ which is given implicitly
by $P(\Phi_Q \in M) = Q(M)$, $M \in \mathcal{M}_0$. By the second representation in Theorem A.2 we
can also consider $Q$ as a random variable on $(M_0,\mathcal{M}_0,P)$ with distribution $\lambda = \lambda_P$.
Thus, $\Phi$ and $Q$ have a joint distribution and the conditional distribution of $\Phi$ given
$Q$ is well-defined:

$$P(\,\cdot \mid Q = Q^*) = Q^*(\cdot).$$
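
The ergodic decomposition can be illustrated with a simple simulation. The sketch below assumes a mixed Poisson process on the line whose intensity is drawn once per realization from two hypothetical values; each value plays the role of an ergodicity class $Q$. The function name, the class intensities and the window length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def window_average(intensity, T, rng):
    """Points per unit length of a homogeneous Poisson process on [0, T]."""
    return rng.poisson(intensity * T) / T

classes = np.array([1.0, 5.0])      # hypothetical ergodicity classes (intensities)
T = 10_000.0                        # length of the observation window

# Draw one class per realization, then average within that realization.
drawn = rng.choice(classes, size=6)
averages = np.array([window_average(lam, T, rng) for lam in drawn])
```

Each within-realization average settles near the intensity of its own class rather than near the overall mean intensity 3, which corresponds to the limit $\mathbb{E}(X \mid \mathcal{A}_0)$ in Theorem A.1 being random when the process is not ergodic.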

Appendix B. Proof of Theorem 4.1

The following lemma generalizes the classical individual ergodic theorem [7, Prop.
12.2.II] to a situation in which the thinning of the point process depends on the size
of the observation window.

Lemma B.1. Let $\Phi$ be a stationary and ergodic MPP on $\mathbb{R}$ with real-valued marks
and let $(u_T)_{T\ge 0}$ be a family of non-negative non-decreasing numbers such that



(29) 
Then, for $T \to \infty$, we have the almost sure convergence



1. 



Note that the almost sure convergence $\hat\alpha_1(I,\Phi,T)/(\lambda T) \to 1$ as $T \to \infty$ follows from
the classical individual ergodic theorem (e.g., [7, Prop. 12.2.II]).

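Proof of Lemma B.1. With $g_u(y) = 1 - f_{\mathrm{cond},u}(y)$, $y \in \mathbb{R}$, we obtain the almost sure
convergence

$$\frac{\hat\alpha_{g_{u_T}}(I,\Phi,T)}{T\,\mathbb{E}_\Phi \hat\alpha_{g_{u_T}}(I,\Phi,1)} \to 1$$

from [7, Prop. 12.2.VII] and the subsequent remarks. Further, $\lambda = \mathbb{E}_\Phi\hat\alpha_1(I,\Phi,1) =
\mathbb{E}_\Phi\hat\alpha_{f_{\mathrm{cond},u_T}}(I,\Phi,1) + \mathbb{E}_\Phi\hat\alpha_{g_{u_T}}(I,\Phi,1)$. Hence,

$$\begin{aligned}
\frac{\hat\alpha_{f_{\mathrm{cond},u_T}}(I,\Phi,T)}{T\,\mathbb{E}_\Phi\hat\alpha_{f_{\mathrm{cond},u_T}}(I,\Phi,1)}
&= \frac{\hat\alpha_1(I,\Phi,T) - \hat\alpha_{g_{u_T}}(I,\Phi,T)}{T\,\mathbb{E}_\Phi\hat\alpha_{f_{\mathrm{cond},u_T}}(I,\Phi,1)} \\
&= \frac{\lambda\,\dfrac{\hat\alpha_1(I,\Phi,T)}{\lambda T} - \mathbb{E}_\Phi\hat\alpha_{g_{u_T}}(I,\Phi,1)\,\dfrac{\hat\alpha_{g_{u_T}}(I,\Phi,T)}{T\,\mathbb{E}_\Phi\hat\alpha_{g_{u_T}}(I,\Phi,1)}}{\mathbb{E}_\Phi\hat\alpha_{f_{\mathrm{cond},u_T}}(I,\Phi,1)}
\end{aligned}$$

and the RHS converges to 1 as long as $\mathbb{E}_\Phi\hat\alpha_{f_{\mathrm{cond},u_T}}(I,\Phi,1)$ converges to 0 at a slower
rate (in the sense of (29)) than $\hat\alpha_1(I,\Phi,T)/(\lambda T)$ and $\hat\alpha_{g_{u_T}}(I,\Phi,T)/(T\,\mathbb{E}_\Phi\hat\alpha_{g_{u_T}}(I,\Phi,1))$ approach 1.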
Proof of Theorem \4-l\ We have 

&* Ut (I, *, T) _ (I, $, T) V /[X^T] v/A^ 



and by Lemma IB. 11 the last factor converges to 1. (Here, for a > 0, [a] denotes 
the smallest integer > a.) Hence, for convergence of the LHS it is sufficient to 
show that &f (I, $, T)/y/[\ UT T] converges to a Gaussian variable. According to [131 

Lemma 2.1, Lemma 2.3], we can write $ as a sum of Dirac measures Srr it Yi)> * ^ ^> 
with random vectors (Ti,Yi) and T\ < T2 < . . . If only a finite observation window 
[0, T] is considered, the number of summands N(T) is also finite but random. Then 
we introduce a modified version of a** (/, T), in which the sum is cut after a fixed 
number AT max £ N of terms: 

N(T) AT(T) 
i=l 3=1 

' 1 [E^ri E^L? /co n d, U m)lT.,-r., e i+E^ =1 /co„d, U (Y' S )lT.,-T ie i<A' m ax] " 

Then we have 

d} (/,$,r) d;f u - T1 (/,$,oo) a% (/,$,r)-d;' [A - T1 (/,<i>,oo) 

J ^ = - + 1 T (30) 
and the first summand of the RHS contains a non-random number of summands
(namely $[\lambda_{u_T} T]$). By the minimum distance assumption in condition (m-dependent

* [A T] 

Random Field Model), each mark occurs at most |/|/do times in &/' " T (J, oo). 
By the finite-range assumption on the covariance function of the underlying random 
field, the sequence (li)ieN is [/io/c?o]-dependent. Hence, the sequence of summands 

* [A T1 

in a J " T ($,1,00) is [|/|ft-o/do]-dependent. By assumption, the first four moments 
of the excesses Zi — [f UT (Yi) \ f(Yi) > u T ] exist and converge to some constant in 
(0, 00) as T 



00. Then the sequence of summands in d?' A " TT ' ($, /, 00) satisfies the 



assumptions of Berk's CLT for triangular arrays of m-dependent random variables [2] 

(J, <£, 00)/ \J[X UT T] approaches a Gaussian distribution 



and thus, for T — > 00, a 



fu T 

with zero mean and variance 



= lim Var 



Atf^V^oo)! /({\ UT T}). 



Next, we show that the second summand in (30) converges to 0 in probability. We



use the notation Aa f u — d^ (I, 3>, T) —a 



*,[A„ T T] 



(I, $,00) and Aai = d/ cond „ (I, T)- 



Vco. 



(I, $, 00) and consider 



¥(\Aa fuT I > e^\X^T]) 

= f( 



\Aa fuT I > £\J[\ UT T] 
Aa h 



|Aai| >e[A UT T" 



•(|Aai| >£[A„ T T]) 



,| > ey/[X UT T] |Aai| < e[X UT T}j -P(|Aai| < e[X UT T}) 
< P(|Aai| > e[X UT T})+v[\Aa fuT \ > ey/[X UT T]\\A ai \ < s[X UT T]j 

[A T] 

Note that «/ c "J d J {I, oo) = [A UT T] and hence 



(31) 



Aai| >e[X UT T}) = P( d/ cond ,„ T (/, $,T)/[A Ur r] -1 



> e -> for T 



(32) 

To estimate the last summand in (31), we use again that the sequence $(Y_i)_{i\in\mathbb{N}}$
is $[h_0/d_0]$-dependent and that the number of points in any interval of length $|I|$ is
bounded by $c = |I|/d_0$. This means that each term $f_{u_T}(Y_i)$ occurs at most $c$ times in
the sum $\Delta\alpha_{f_{u_T}}$. Obviously, the variance of $\Delta\alpha_{f_{u_T}}$, or more generally all even centered
moments of $\Delta\alpha_{f_{u_T}}$, become maximal if this bound is attained, i.e., if for a given
total number $\Delta\alpha_1$ of summands, only $[\Delta\alpha_1/c]$ different $Y_i$ are involved. With $Z_i^* =
Z_i - \mathbb{E}Z_i = [f_{u_T}(Y_i) \mid f(Y_i) > u_T] - e(u_T)$, where $e(u) = \mathbb{E}[f_u(Y(0)) \mid f(Y(0)) > u]$,
we get



F(\Aa fur I > e^[X UT T] I I Aai I < e[X UT T\) 
= p(|Aa/„ T | 4 >£ 4 [A Mr T] 2 \A ai \ < e[X UT T] 



< 



[elX^T]^ 1 ] 



1=1 



cZ* > e 4 [X UT T}- 
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[e[\ UT T]c-^] 

<c 4 J2 ^TO'(£ 4 [V^)"' 

i,j,k,l—l 

'ho^ " 

3 



< c 4 • [e^Tjc- 1 ] • (^j E [(Z x *) 4 ] • (£ 4 [A„ r T] 2 ) _1 
= (\ Ut T)~ 1 e~ 3 (c^j 3 E [{Zlf} (1 + o(l)) — ► 0, (T ^ oo). 



Plugging this and (32) into (31) yields that $\Delta\alpha_{f_{u_T}} / \sqrt{[\lambda_{u_T} T]} \to 0$ in probability.



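Appendix C. Proofs of Examples in Section 4

Proof of Example 4.1. For $I$ and $T$ large, we have $\hat\alpha_1(I,\Phi,T) \approx N \cdot N|I|/\nu_T$ and
each distinct summand in $\hat\alpha_f(I,\Phi,T)$ occurs $N|I|/\nu_T \approx \hat\alpha_1(I,\Phi,T)/N$ times. Thus,
$\hat\alpha_f(I,\Phi,T) \approx \hat\alpha_1(I,\Phi,T) \sum_{i=1}^N f(y_i)/N$ and $\operatorname{Var}[\hat\alpha_f(I,\Phi,T) \mid \mathcal{A}^*_n] \approx \hat\alpha_1(I,\Phi,T)^2\, v/N$.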



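Proof of Example 4.2. We have

$$\begin{aligned}
\mathbb{E}[\hat\alpha_f(I,\Phi,T)/\hat\alpha_1(I,\Phi,T) \mid \mathcal{A}^*_n]
&= \hat\alpha_1(I,\Phi,T)^{-1} \cdot \mathbb{E}\Big[\sum^{\neq}_{(t_1,y_1,z_1),(t_2,y_2,z_2)\in\Phi,\ t_1\in[0,T]} z_1 f(y_1)\,\mathbf{1}_{t_2-t_1\in I} \,\Big|\, \mathcal{A}^*_n\Big] \\
&= \hat\alpha_1(I,\Phi,T)^{-1} \cdot \sum_{t_1\in\Phi_g\cap[0,T]} \#\{t_2 \in \Phi_g : t_2 - t_1 \in I\} \cdot \mathbb{E}[f(Y(t_1)) \mid \mathcal{A}^*_n] \\
&= \mathbb{E}f(Y(0))
\end{aligned}$$

and

$$\begin{aligned}
\mathbb{E}[\hat\alpha_f(I,\Phi,T)^2 \mid \mathcal{A}^*_n]
&= \mathbb{E}\Big[\sum_{t_1,s_1\in\Phi_g\cap[0,T]} f(Y(t_1))\,f(Y(s_1)) \cdot \#\{t_2\in\Phi_g : t_2-t_1\in I\} \cdot \#\{s_2\in\Phi_g : s_2-s_1\in I\} \,\Big|\, \mathcal{A}^*_n\Big] \\
&= \sum_{t_1,s_1\in\Phi_g\cap[0,T]} n(t_1,\Phi_g,I)\,n(s_1,\Phi_g,I) \cdot \mathbb{E}[f(Y(t_1))\,f(Y(s_1)) \mid \mathcal{A}^*_n] \\
&= \sum_{t_1,s_1\in\Phi_g\cap[0,T]} n(t_1,\Phi_g,I)\,n(s_1,\Phi_g,I) \cdot \big([\mathbb{E}f(Y(0))]^2 + \operatorname{Cov}[f(Y(t_1)), f(Y(s_1))]\big) \\
&= \sum_{t_1,s_1\in\Phi_g\cap[0,T]} n(t_1,\Phi_g,I)\,n(s_1,\Phi_g,I) \cdot \operatorname{Cov}[f(Y(t_1)), f(Y(s_1))]
 + (\mathbb{E}f(Y(0)))^2\,\hat\alpha_1(I,\Phi,T)^2.
\end{aligned}$$

Hence,

$$\begin{aligned}
\operatorname{Var}[\hat\alpha_f(I,\Phi,T)/\hat\alpha_1(I,\Phi,T) \mid \mathcal{A}^*_n]
&= \mathbb{E}[(\hat\alpha_f(I,\Phi,T)/\hat\alpha_1(I,\Phi,T))^2 \mid \mathcal{A}^*_n] - (\mathbb{E}[\hat\alpha_f(I,\Phi,T)/\hat\alpha_1(I,\Phi,T) \mid \mathcal{A}^*_n])^2 \\
&= \hat\alpha_1(I,\Phi,T)^{-2}\, \mathbb{E}[\hat\alpha_f(I,\Phi,T)^2 \mid \mathcal{A}^*_n] - (\mathbb{E}f(Y(0)))^2 \\
&= \hat\alpha_1(I,\Phi,T)^{-2} \sum_{t_1,s_1\in\Phi_g\cap[0,T]} n(t_1,\Phi_g,I)\,n(s_1,\Phi_g,I) \cdot \operatorname{Cov}[f(Y(t_1)), f(Y(s_1))].
\end{aligned}$$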